Domain-Independent User Satisfaction Reward Estimation for Dialogue Policy Learning

Authors

  • Stefan Ultes
  • Pawel Budzianowski
  • Iñigo Casanueva
  • Nikola Mrksic
  • Lina Maria Rojas-Barahona
  • Pei-hao Su
  • Tsung-Hsien Wen
  • Milica Gasic
  • Steve J. Young
Abstract

Learning suitable and well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most work based on reinforcement learning employs an objective measure such as task success to model the reward signal, we propose to use a reward based on user satisfaction. We show in simulated experiments that a live user satisfaction estimation model may be applied, resulting in higher estimated satisfaction whilst achieving similar success rates. Moreover, we show that a satisfaction estimation model trained on one domain may be applied in many other domains that cover a similar task. We verify our findings by applying the model to one of the domains to learn a policy from real users, and compare its performance to policies that use the user satisfaction and task success acquired directly from the users as reward.
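To make the contrast between the two reward signals concrete, the following is a minimal sketch, not the authors' implementation. It assumes a per-turn penalty, a maximum end-of-dialogue bonus, and a satisfaction estimate on a 1 to 5 scale; these constants, the `DialogueOutcome` type, and both function names are hypothetical choices made for illustration and are not given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class DialogueOutcome:
    """Summary of one finished dialogue (hypothetical structure)."""
    num_turns: int
    task_success: bool
    estimated_satisfaction: float  # assumed estimator output on a 1-5 scale


# Assumed constants, chosen only for illustration.
TURN_PENALTY = 1.0
MAX_BONUS = 20.0
MIN_RATING, MAX_RATING = 1.0, 5.0


def task_success_reward(outcome: DialogueOutcome) -> float:
    """Objective reward: fixed bonus on success, minus a per-turn penalty."""
    bonus = MAX_BONUS if outcome.task_success else 0.0
    return bonus - TURN_PENALTY * outcome.num_turns


def satisfaction_reward(outcome: DialogueOutcome) -> float:
    """Satisfaction-based reward: bonus scaled by the estimated rating."""
    # Clamp the estimate to the assumed rating scale before scaling.
    rating = min(max(outcome.estimated_satisfaction, MIN_RATING), MAX_RATING)
    scaled = (rating - MIN_RATING) / (MAX_RATING - MIN_RATING)
    return MAX_BONUS * scaled - TURN_PENALTY * outcome.num_turns
```

Under these assumptions, a failed dialogue that the user still rated moderately receives a graded reward rather than only the turn penalty, which is one way a satisfaction-based signal can differ from a binary success signal.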


Similar resources

Reinforcement learning for parameter estimation in statistical spoken dialogue systems

Reinforcement techniques have been successfully used to maximise the expected cumulative reward of statistical dialogue systems. Typically, reinforcement learning is used to estimate the parameters of a dialogue policy which selects the system’s responses based on the inferred dialogue state. However, the inference of the dialogue state itself depends on a dialogue model which describes the exp...


Reward Estimation for Dialogue Policy Optimisation

Viewing dialogue management as a reinforcement learning task enables a system to learn to act optimally by maximising a reward function. This reward function is designed to induce the system behaviour required for the target application and for goal-oriented applications, this usually means fulfilling the user’s goal as efficiently as possible. However, in real-world spoken dialogue system appl...


Learning cooperative persuasive dialogue policies using framing

In this paper, we propose a new framework of cooperative persuasive dialogue, where a dialogue system simultaneously attempts to achieve user satisfaction while persuading the user to take some action that achieves a pre-defined system goal. Within this framework, we describe a method for reinforcement learning of cooperative persuasive dialogue policies by defining a reward function that refle...


Natural belief-critic: a reinforcement algorithm for parameter estimation in statistical spoken dialogue systems

This paper presents a novel algorithm for learning parameters in statistical dialogue systems which are modelled as Partially Observable Markov Decision Processes (POMDPs). The three main components of a POMDP dialogue manager are a dialogue model representing dialogue state information; a policy which selects the system’s responses based on the inferred state; and a reward function which speci...


On-line Active Reward Learning for Policy Optimisation in Spoken Dialogue Systems

The ability to compute an accurate reward function is essential for optimising a dialogue policy via reinforcement learning. In real-world applications, using explicit user feedback as the reward signal is often unreliable and costly to collect. This problem can be mitigated if the user’s intent is known in advance or data is available to pre-train a task success predictor off-line. In practice...



Publication date: 2017